@andygrove
Member

Summary

  • Fix incorrect timestamp timezone handling in the schema adapter on the df52 branch
  • INT96 Parquet timestamps coerced to Timestamp(us, None) by DataFusion were being routed through Spark's Cast expression when the logical schema expected Timestamp(us, Some("UTC")). Spark's Cast treats None-timezone as TimestampNTZ (local time) and applies a timezone conversion, shifting values by the session timezone offset (e.g., -5h45m for Asia/Kathmandu)
  • Route Timestamp -> Timestamp mismatches through CometCastColumnExpr, which delegates to spark_parquet_convert and handles the mismatch as a metadata-only timezone relabel that leaves the underlying values unchanged
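
The difference between the two code paths can be illustrated with a minimal, dependency-free sketch. It is not the Comet implementation: `Ts`, `relabel`, `cast_as_local_time`, and `KATHMANDU_OFFSET_SECS` are hypothetical names introduced only for illustration, and a fixed UTC+05:45 offset is assumed for Asia/Kathmandu.

```rust
/// Microseconds per second.
const MICROS_PER_SEC: i64 = 1_000_000;

/// Assumed fixed offset for Asia/Kathmandu (UTC+05:45), in seconds.
const KATHMANDU_OFFSET_SECS: i64 = 5 * 3600 + 45 * 60;

/// Hypothetical stand-in for one value of a Timestamp(us, tz) column.
#[derive(Debug, Clone, PartialEq)]
struct Ts {
    /// Microseconds since the Unix epoch.
    micros: i64,
    /// `None` models Timestamp(us, None); `Some("UTC")` models Timestamp(us, Some("UTC")).
    tz: Option<String>,
}

/// Metadata-only relabel: attach a timezone and keep the stored value as-is.
fn relabel(ts: &Ts, tz: &str) -> Ts {
    Ts {
        micros: ts.micros,
        tz: Some(tz.to_string()),
    }
}

/// Cast that interprets a None-timezone value as local wall-clock time in the
/// session timezone and converts it to UTC, which shifts the stored value.
fn cast_as_local_time(ts: &Ts, session_offset_secs: i64) -> Ts {
    Ts {
        micros: ts.micros - session_offset_secs * MICROS_PER_SEC,
        tz: Some("UTC".to_string()),
    }
}
```

Under these assumptions, `relabel(&raw, "UTC")` returns the same `micros` as `raw`, while `cast_as_local_time(&raw, KATHMANDU_OFFSET_SECS)` returns a value 5h45m earlier, which matches the shift described in the summary.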

Test plan

  • Existing test "SortMergeJoin with unsupported key type should fall back to Spark" in CometJoinSuite should now pass
  • Existing Rust tests (parquet_roundtrip_int_as_string, parquet_roundtrip_unsigned_int) continue to pass
  • CI passes

🤖 Generated with Claude Code

…_convert

INT96 Parquet timestamps are coerced to Timestamp(us, None) by DataFusion
but the logical schema expects Timestamp(us, Some("UTC")). The schema
adapter was routing this mismatch through Spark's Cast expression, which
incorrectly treats None-timezone values as TimestampNTZ (local time) and
applies a timezone conversion. This caused results to be shifted by the
session timezone offset (e.g., -5h45m for Asia/Kathmandu).

Route Timestamp->Timestamp mismatches through CometCastColumnExpr which
delegates to spark_parquet_convert, handling this as a metadata-only
timezone relabel without modifying the underlying values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as ready for review February 11, 2026 23:06
@andygrove
Member Author

CI Comparison: PR #3494 vs df52 baseline (PR #3470)

| | PR #3470 (df52 baseline) | PR #3494 (with this fix) |
|---|---|---|
| Failing CI jobs | 34 | 30 |
| Passing CI jobs | same set + 4 extra failures | same set |

Jobs fixed by this PR (fail in #3470, pass in #3494)

  • spark-sql-native_datafusion-sql_core-3/spark-3.5.8
  • spark-sql-native_datafusion-sql_hive-2/spark-3.5.8
  • ubuntu-latest/Spark 3.5, JDK 17, Scala 2.12/native_datafusion [exec]
  • ubuntu-latest/Spark 3.5, JDK 17, Scala 2.12/native_datafusion [sql]

Regressions introduced by this PR (pass in #3470, fail in #3494)

None — every failure in #3494 also exists in #3470.

Summary

This PR is a strict improvement over the df52 baseline: 4 fewer failing CI jobs with zero regressions.
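
The "fixed" and "regressed" lists in a comparison like this are set differences between the failing jobs of the two runs. A minimal sketch, assuming the failing job names of each run are available as sets of strings (the function name `fixed_and_regressed` is hypothetical and not part of any CI tooling):

```rust
use std::collections::BTreeSet;

/// Returns (fixed, regressed):
/// - fixed: jobs that fail in the baseline but not in the PR
/// - regressed: jobs that fail in the PR but not in the baseline
fn fixed_and_regressed<'a>(
    baseline_failing: &BTreeSet<&'a str>,
    pr_failing: &BTreeSet<&'a str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let fixed = baseline_failing.difference(pr_failing).copied().collect();
    let regressed = pr_failing.difference(baseline_failing).copied().collect();
    (fixed, regressed)
}
```

A "strict improvement" in this sense means `fixed` is non-empty and `regressed` is empty.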


Note

This comment was generated with the assistance of AI (Claude Code) and should be verified independently.

@andygrove andygrove merged commit f0652aa into apache:df52 Feb 12, 2026
81 of 111 checks passed